{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <center>RegEx in Python</center>\n",
    "\n",
    "![](images/memes/meme11.jpeg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quantifiers\n",
    "\n",
    "**Quantifiers** are the mechanisms to define how a **character**, **metacharacter**, or **character set** can be **repeated**.\n",
    "\n",
    "Here is the list of 4 basic quantifers:\n",
    "\n",
    "<table style=\"border: 1px solid black; font-size:15px;\">\n",
    "<thead>\n",
    "    <th>Symbol</th>\n",
    "    <th>Name</th>\n",
    "    <th>Quantification of previous character</th>\n",
    "</thead>\n",
    "    \n",
    "<tbody>\n",
    "<tr>\n",
    "    <td>?</td>\n",
    "    <td>Question Mark</td>\n",
    "    <td>Optional (0 or 1 repetitions)</td>\n",
    "</tr>\n",
    "    \n",
    "<tr>\n",
    "    <td>*</td>\n",
    "    <td>Asterisk</td>\n",
    "    <td>Zero or more times</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>+</td>\n",
    "    <td>Plus Sign</td>\n",
    "    <td>One or more times</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>{n,m}</td>\n",
    "    <td>Curly Braces</td>\n",
    "    <td>Between n and m times</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "\n",
    "\n",
    "Let us go through different examples to understand them one by one.\n",
    "\n",
    "### Example 1\n",
    "\n",
    "Find all the matches for `dog` and `dogs` in the given text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "I have 2 dogs. One dog is 1 year old and other one is 2 years old. Both dogs are very cute! \n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"dogs?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['dogs', 'dog', 'dogs']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "I have 2 \u001b[43m\u001b[1mdogs\u001b[0m. One \u001b[43m\u001b[1mdog\u001b[0m is 1 year old and other one is 2 years old. Both \u001b[43m\u001b[1mdogs\u001b[0m are very cute! \n",
      "\n"
     ]
    }
   ],
   "source": [
    "from utils import highlight_regex_matches\n",
    "highlight_regex_matches(pattern, txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 2\n",
    "\n",
    "Find all filenames starting with `file` and ending with `.txt` in the given text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "file1.txt\n",
    "file_one.txt\n",
    "file.txt\n",
    "fil.txt\n",
    "file.xml\n",
    "file-1.txt\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"file[\\w-]*\\.txt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['file1.txt', 'file_one.txt', 'file.txt', 'file-1.txt']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[43m\u001b[1mfile1.txt\u001b[0m\n",
      "\u001b[43m\u001b[1mfile_one.txt\u001b[0m\n",
      "\u001b[43m\u001b[1mfile.txt\u001b[0m\n",
      "fil.txt\n",
      "file.xml\n",
      "\u001b[43m\u001b[1mfile-1.txt\u001b[0m\n",
      "\n"
     ]
    }
   ],
   "source": [
    "highlight_regex_matches(pattern, txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 3\n",
    "\n",
    "Find all filenames starting with `file` followed by 1 or more digits and ending with `.txt` in the given text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "file1.txt\n",
    "file_one.txt\n",
    "file09.txt\n",
    "fil.txt\n",
    "file23.xml\n",
    "file.txt\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"file\\d+\\.txt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['file1.txt', 'file09.txt']"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[43m\u001b[1mfile1.txt\u001b[0m\n",
      "file_one.txt\n",
      "\u001b[43m\u001b[1mfile09.txt\u001b[0m\n",
      "fil.txt\n",
      "file23.xml\n",
      "file.txt\n",
      "\n"
     ]
    }
   ],
   "source": [
    "highlight_regex_matches(pattern, txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the curly brackets syntax here with these modifications:\n",
    "\n",
    "<table style=\"border: 1px solid black; font-size:15px;\">\n",
    "<thead>\n",
    "    <th>Syntax</th>\n",
    "    <th>Description</th>\n",
    "</thead>\n",
    "    \n",
    "<tbody>\n",
    "<tr>\n",
    "    <td>{n}</td>\n",
    "    <td>The previous character is repeated exactly n times.</td>\n",
    "</tr>\n",
    "    \n",
    "<tr>\n",
    "    <td>{n,}</td>\n",
    "    <td>The previous character is repeated at least n times.</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>{,n}</td>\n",
    "    <td>The previous character is repeated at most n times.</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>{n,m}</td>\n",
    "    <td>The previous character is repeated between n and m times (both inclusive).</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "\n",
    "### Example 4\n",
    "\n",
    "Find years in the given text.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "The first season of Indian Premiere League (IPL) was played in 2008. \n",
    "The second season was played in 2009 in South Africa. \n",
    "Last season was played in 2018 and won by Chennai Super Kings (CSK).\n",
    "CSK won the title in 2010 and 2011 as well.\n",
    "Mumbai Indians (MI) has also won the title 3 times in 2013, 2015 and 2017.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"\\d{4}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['2008', '2009', '2018', '2010', '2011', '2013', '2015', '2017']"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 5\n",
    "\n",
    "In the given text, filter out all 4 or more digit numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "123143\n",
    "432\n",
    "5657\n",
    "4435\n",
    "54\n",
    "65111\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"\\d{4,}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['123143', '5657', '4435', '65111']"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "re.findall(pattern, txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 6\n",
    "\n",
    "Write a pattern to validate telephone numbers.\n",
    "\n",
    "Telephone numbers can be of the form: `555-555-5555`, `555 555 5555`, `5555555555`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "555-555-5555\n",
    "555 555 5555\n",
    "5555555555\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"\\d{3}[-\\s]?\\d{3}[-\\s]?\\d{4}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['555-555-5555', '555 555 5555', '5555555555']"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](images/memes/meme12.jpg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
